Abstract — Workflow scheduling algorithms for Clusters and Grids typically ignore costs related to utilization of the infrastructure, and they have limited capacity to take advantage of elastic infrastructures. Existing research on execution of scientific workflows in Clouds either tries to minimize the workflow execution time while ignoring deadlines and budgets, or focuses on the minimization of cost while trying to meet the application deadline. The proposed Enhanced IC-PCP with Replication (EIPR) algorithm uses task replication to increase the likelihood of completing the execution of a scientific workflow application within a user-defined deadline in a public Cloud environment. Such environments typically offer high availability but significant performance variation: two resources with the same characteristics may have different performance at a given time, which results in variation in the execution time of tasks that may lead to delays in workflow completion. Replication is used to reduce the impact of performance variation of public Cloud resources on the deadlines of workflows. We also examine how the replication-based approach can be used when provisioning and scheduling are performed for multiple workflows whose requests arrive at different rates. The EIPR algorithm increases the chance of deadlines being met and reduces the total execution time of workflows as the budget available for replication increases.

Index Terms — Cloud computing, scientific workflows, task 
replication, soft deadline. 


I. INTRODUCTION 

AMONG the programming paradigms available for development of scientific applications, the workflow model is extensively applied in diverse areas such as astronomy, bioinformatics, and physics. Scientific workflows are described as directed acyclic graphs (DAGs) whose nodes represent tasks and whose edges represent dependencies among tasks. Because a single workflow can contain hundreds or thousands of tasks, this type of application can benefit from large-scale infrastructures. Among such infrastructures, the one of special interest in this paper is the public Cloud. This is because these infrastructures are available on a pay-per-use basis and can provide dynamic scaling in response to the needs of the application (a property known as elasticity).
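Because the workflow model treats an application as a DAG, any scheduler first needs an order in which every task follows its parents. The Python sketch below is a minimal illustration, not part of the approach described in this paper: it applies Kahn's algorithm to a hypothetical four-task workflow (the task names `A` to `D` are invented for the example) and rejects inputs that contain a cycle.

```python
from collections import deque


def topological_order(tasks, edges):
    """Return tasks in an order where every task follows all of its parents.

    tasks: iterable of task names (nodes of the DAG).
    edges: iterable of (parent, child) pairs (dependencies).
    Raises ValueError if the graph has a cycle, i.e. it is not a DAG.
    """
    children = {t: [] for t in tasks}
    indegree = {t: 0 for t in children}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Kahn's algorithm: repeatedly release tasks with no pending parents.
    ready = deque(t for t in children if indegree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(children):
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical workflow: A fans out to B and C, which join at D.
tasks = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(topological_order(tasks, edges))  # ['A', 'B', 'C', 'D']
```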

Therefore, resources for execution of the workflow can be provisioned on demand, and their number can be increased if there is enough budget to support it. This Cloud utilization model, in which users obtain hardware resources such as virtual machines on which they deploy their own applications, is called Infrastructure as a Service (IaaS). For the sake of simplicity, throughout this paper we refer to Cloud IaaS providers as Cloud providers. This capability of Clouds makes them a suitable platform to host deadline-constrained scientific workflows.

II. RELATED WORK 

A. SCHEDULING REAL-TIME TRANSACTIONS

Abbott and Garcia-Molina [1] show that transactions in a database system can have real-time constraints. Consider, for example, program trading: the use of computer programs to initiate trades in a financial market with little or no human intervention. A financial market is a complex process whose state is partially captured by variables such as current stock prices, changes in stock prices, volume of trading, trends, and composite indexes. These variables and others can be stored and organized in a database to model a financial market, and program trading is one type of process in such a system.

B. DEADLINE-CONSTRAINED WORKFLOW SCHEDULING ALGORITHMS

Abrishami et al. [2] note that the advent of cloud computing as a new model of service provisioning in distributed systems encourages researchers to investigate its benefits and drawbacks for executing scientific applications such as workflows. One of the most challenging problems in Clouds is workflow scheduling, i.e., the problem of satisfying the QoS requirements of the user while minimizing the cost of workflow execution. The authors had previously designed and analyzed a two-phase scheduling algorithm for utility Grids, called Partial Critical Paths (PCP), which aims to minimize the cost of workflow execution while meeting a user-defined deadline. However, they argue that Clouds differ from utility Grids in three ways: on-demand resource provisioning, homogeneous networks, and the pay-as-you-go pricing model. They therefore adapt the PCP algorithm to the Cloud environment and propose two workflow scheduling algorithms: a one-phase algorithm called IaaS Cloud Partial Critical Paths (IC-PCP), and a two-phase algorithm called IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2). Both algorithms have polynomial time complexity, which makes them suitable options for scheduling large workflows.
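The PCP family of algorithms in [2] reasons about each task's earliest start time (EST) and latest finish time (LFT). The sketch below computes only those two quantities, under stated simplifications: one runtime per task (as if all tasks ran on a single VM type) and zero data transfer time. It is not an implementation of PCP, IC-PCP, or IC-PCPD2; the workflow, runtimes, and deadline are hypothetical.

```python
def est_lft(order, parents, runtime, deadline):
    """Earliest start time (EST) and latest finish time (LFT) of each task.

    order: tasks in topological order.
    parents: dict task -> list of parent tasks.
    runtime: dict task -> execution time (single assumed VM type).
    deadline: workflow deadline; exit tasks must finish by it.
    Data transfer times are ignored in this sketch.
    """
    children = {t: [] for t in order}
    for t in order:
        for p in parents.get(t, []):
            children[p].append(t)
    est = {}
    for t in order:  # forward pass: a task starts after all parents finish
        est[t] = max((est[p] + runtime[p] for p in parents.get(t, [])), default=0)
    lft = {}
    for t in reversed(order):  # backward pass: leave room for every child
        lft[t] = min((lft[c] - runtime[c] for c in children[t]), default=deadline)
    return est, lft


order = ["A", "B", "C", "D"]
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
runtime = {"A": 2, "B": 3, "C": 1, "D": 2}
est, lft = est_lft(order, parents, runtime, deadline=10)
slack = {t: lft[t] - est[t] - runtime[t] for t in order}
print(slack)  # {'A': 3, 'B': 3, 'C': 5, 'D': 3}
```

Tasks with the smallest slack (here A, B, and D) lie on the critical path, which is the kind of path that PCP-style algorithms try to schedule together on one resource.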

C. WORKFLOW SCHEDULING ALGORITHMS FOR GRID COMPUTING

Yu et al. [3] observe that workflow scheduling is one of the key issues in the management of workflow execution. Scheduling is a process that maps and manages the execution of interdependent tasks on distributed resources. It involves allocating suitable resources to workflow tasks so that the execution can be completed while satisfying objective functions specified by users. Proper scheduling can have a significant impact on the performance of the system. In their chapter, the authors investigate existing workflow scheduling algorithms developed and deployed by various Grid projects.
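To make the idea of mapping interdependent tasks onto distributed resources concrete, the following sketch assigns each task, in topological order, to whichever resource would finish it earliest. This greedy earliest-finish-time rule is only one simple heuristic and is not taken from [3]; communication costs are ignored, and the resource names and speeds are hypothetical.

```python
def greedy_map(order, parents, work, speeds):
    """Map each task, in topological order, to the resource that finishes it first.

    work: dict task -> amount of work (e.g. millions of instructions).
    speeds: dict resource -> processing speed (work units per second).
    Returns dict task -> (resource, start, finish). Communication is ignored.
    """
    ready_at = {r: 0.0 for r in speeds}  # when each resource becomes free
    schedule = {}
    for t in order:
        # A task may start only after all of its parents have finished.
        deps_done = max((schedule[p][2] for p in parents.get(t, [])), default=0.0)
        best = None
        for r, speed in speeds.items():
            start = max(ready_at[r], deps_done)
            finish = start + work[t] / speed
            if best is None or finish < best[2]:
                best = (r, start, finish)
        schedule[t] = best
        ready_at[best[0]] = best[2]
    return schedule


order = ["A", "B", "C", "D"]
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
work = {"A": 100, "B": 200, "C": 100, "D": 100}
speeds = {"fast": 100, "slow": 50}  # hypothetical resources
for task, (res, start, finish) in greedy_map(order, parents, work, speeds).items():
    print(task, res, start, finish)
```

With these hypothetical speeds, C is placed on the slower resource because the faster one is still busy with B, and the last task finishes at t = 4.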

D. CLOUDSIM: A TOOLKIT FOR MODELING AND SIMULATION OF CLOUD COMPUTING ENVIRONMENTS

Calheiros et al. [4] observe that it is not possible to perform benchmarking experiments in a repeatable, dependable, and scalable environment using real-world Clouds. Considering that none of the existing distributed system simulators offered an environment that could be used for modeling Clouds, they present CloudSim.

Evaluating federated cloud computing components. One set of experiments aimed at testing the CloudSim components that form the basis for modeling and simulation of a federated network of clouds. The authors evaluated a straightforward load-migration policy that performed online migration of VMs across federated cloud providers when the origin provider did not have the requested number of free VM slots available. The migration (1) creates a VM instance that has the same configuration as the original VM and is also compliant with the destination provider's configuration, and (2) migrates the cloudlets assigned to the original VM to the newly instantiated VM. Each data center in the federated network has 50 hosts with 10 GB of RAM, 2 TB of storage, and a 1000 MIPS processor, and uses a time-shared VM scheduler. The user requested 25 VMs, each with 256 MB of memory, 1 GB of storage, and a CPU managed by a time-shared cloudlet scheduler; each cloudlet has a length of 1,800,000 MI.

In the evaluation of the algorithms proposed in [2], the simulation results show that both algorithms have a promising performance, with IC-PCP performing better than IC-PCPD2 in most cases.
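The experiment configuration of [4] implies a simple runtime estimate for each cloudlet. Assuming, for illustration, that a VM receives the full 1000 MIPS of its host and that a time-shared cloudlet scheduler divides that capacity evenly among concurrently running cloudlets, a 1,800,000 MI cloudlet takes:

```python
def cloudlet_runtime(length_mi, vm_mips, concurrent=1):
    """Runtime in seconds of a cloudlet of length_mi million instructions.

    Assumes the VM delivers vm_mips and that time sharing splits that
    capacity evenly among `concurrent` cloudlets running side by side.
    """
    return length_mi / (vm_mips / concurrent)


print(cloudlet_runtime(1_800_000, 1000))     # 1800.0 seconds (30 minutes)
print(cloudlet_runtime(1_800_000, 1000, 2))  # 3600.0 seconds
```

The even split is a simplification of time sharing; it is not a description of CloudSim's internal scheduler classes.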

Two setups were compared: one with a federated network and one without. As future work, the authors plan to add new pricing and provisioning policies to CloudSim, together with models for database services such as blob storage, QoS monitoring capability at the VM level, and pricing models for public Clouds. Furthermore, the geographical region where a data center is hosted influences its total power bill; locations where power costs are low and weather conditions are less hostile are therefore attractive.

E. WORKFLOWSIM: A TOOLKIT FOR SIMULATING SCIENTIFIC WORKFLOWS

Chen et al. [5] note that simulation is one of the most popular evaluation methods in scientific workflow studies. However, existing workflow simulators fail to provide a framework that takes into consideration heterogeneous system overheads and failures, and they lack support for widely used workflow optimization techniques such as task clustering. The authors introduce WorkflowSim, which extends the existing CloudSim simulator by providing a higher layer of workflow management. They also show that ignoring system overheads and failures when simulating scientific workflows can cause significant inaccuracies in the predicted workflow runtime. To further validate its value in promoting other research work, they introduce two promising research areas for which WorkflowSim provides a unique and effective evaluation platform.
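A small calculation shows why ignoring overheads can bias predicted runtimes, as [5] argues. The sketch below computes the makespan of a workflow under unlimited resources and adds a fixed per-task overhead, a hypothetical stand-in for overheads such as queue delays in the workflow engine; WorkflowSim itself models overheads in more detail.

```python
def makespan(order, parents, runtime, overhead=0.0):
    """Finish time of the last task, assuming unlimited resources.

    Each task waits for all of its parents, then pays `overhead` seconds
    (a constant per-task overhead is a simplifying assumption) before
    running for runtime[task] seconds.
    """
    finish = {}
    for t in order:
        start = max((finish[p] for p in parents.get(t, [])), default=0.0)
        finish[t] = start + overhead + runtime[t]
    return max(finish.values())


# Hypothetical pipeline of ten 5-second tasks, t0 -> t1 -> ... -> t9.
order = [f"t{i}" for i in range(10)]
parents = {f"t{i}": [f"t{i - 1}"] for i in range(1, 10)}
runtime = {t: 5.0 for t in order}
print(makespan(order, parents, runtime))                # 50.0
print(makespan(order, parents, runtime, overhead=2.0))  # 70.0
```

In this toy pipeline, a simulator that ignored a 2-second per-task overhead would underestimate the runtime by 20 seconds, i.e. 40% of the overhead-free prediction.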


F. SCHEDULING FAULT-TOLERANT DISTRIBUTED HARD REAL-TIME TASKS

Chevochot and Puaut [6] note that replication is a well-known fault tolerance technique, and that several replication strategies exist. To be used in hard real-time systems, replication must be accounted for in scheduling algorithms, and more particularly in the feasibility tests that check whether deadlines will be met. Existing solutions for integrating replicated tasks into scheduling algorithms were specific to a given replication strategy or to its implementation on a given architecture. The paper describes a framework for taking replicated tasks into account in scheduling algorithms that is largely independent of the replication technique. The authors show through an example that the same scheduling algorithm can be used whatever replication strategy is selected, even when several replication strategies are used simultaneously.
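As a simple illustration of a feasibility test that accounts for replicas, the sketch below assumes active replication (every replica executes), preemptive EDF scheduling on each processor, and periodic tasks whose deadlines equal their periods. Under those assumptions a processor is schedulable exactly when its total utilization does not exceed 1, so each replica simply adds its utilization to the processor that hosts it. This is the classic uniprocessor EDF test applied per processor, not the framework of [6]; task and processor names are hypothetical.

```python
def edf_feasible(tasks, placement, processors):
    """Per-processor EDF utilization test for replicated periodic tasks.

    tasks: dict name -> (wcet, period); deadlines equal periods.
    placement: dict name -> processors hosting one replica each
               (active replication: every replica consumes processor time).
    Returns (feasible, utilization per processor).
    """
    load = {p: 0.0 for p in processors}
    for name, (wcet, period) in tasks.items():
        hosts = placement[name]
        # Replicas on the same processor would not tolerate its failure.
        if len(set(hosts)) != len(hosts):
            raise ValueError(f"replicas of {name} must be on distinct processors")
        for p in hosts:
            load[p] += wcet / period
    # Preemptive EDF on one processor meets all implicit deadlines iff U <= 1.
    return all(u <= 1.0 for u in load.values()), load


tasks = {"T1": (2, 10), "T2": (3, 6), "T3": (4, 8)}
placement = {"T1": ["P1", "P2"], "T2": ["P1"], "T3": ["P2"]}
feasible, load = edf_feasible(tasks, placement, ["P1", "P2"])
print(feasible)  # True
```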

G. ON THE EFFICACY, EFFICIENCY AND EMERGENT BEHAVIOR OF TASK REPLICATION

Cirne et al. [7] note that large distributed systems challenge traditional schedulers, as it is often hard to determine a priori how long each task will take to complete on each resource, which is information that such schedulers require as input. Task replication has been applied in a variety of scenarios as a way to circumvent this problem. Task replication consists of dispatching multiple replicas of a task and using the result from the first replica to finish. Replication schedulers are able to achieve good performance even in the absence of information on tasks and resources. They also have lower complexity than traditional schedulers, which makes them better suited for large distributed systems. On the other hand, replication schedulers waste cycles on the replicas that are not the first to finish. Moreover, this extra consumption of resources raises severe concerns about the system-wide performance of a distributed system with multiple, competing replication schedulers. The paper presents a comprehensive study of task replication, comparing replication schedulers against traditional information-based schedulers and establishing their efficacy, efficiency, and emergent behavior. It also introduces a simple access control strategy that can be implemented locally by each resource and greatly improves the overall performance of a system in which multiple replication schedulers compete for resources.
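The first-to-finish semantics of task replication described in [7] can be captured in a few lines. The sketch assumes that all replicas start at the same time and that their runtimes are known after the fact; it reports when the task completes and how much processor time was spent on replicas whose results were discarded, with or without killing the losing replicas.

```python
def replicate(runtimes, cancel_losers=True):
    """Outcome of running one task as len(runtimes) replicas started together.

    runtimes: how long each replica takes on its (assumed) resource.
    The result of the first replica to finish is used. If cancel_losers is
    True, the other replicas are killed at that moment; otherwise they run
    to completion. Returns (completion_time, wasted_cpu_time).
    """
    winner = min(runtimes)
    if cancel_losers:
        wasted = winner * (len(runtimes) - 1)  # losers ran until the winner finished
    else:
        wasted = sum(runtimes) - winner        # losers ran to completion
    return winner, wasted


print(replicate([30, 12, 45]))                       # (12, 24)
print(replicate([30, 12, 45], cancel_losers=False))  # (12, 75)
```

The task completes at 12 regardless of the policy, while the wasted cycles depend heavily on whether losing replicas are cancelled, which is the system-wide concern raised in [7].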

H. DYNAMIC LOAD BALANCING AND JOB REPLICATION IN A GLOBAL-SCALE GRID

Dobber et al. [8] note that global-scale grids provide a massive source of processing power, providing the means to support processor-intensive parallel applications. The strong burstiness and unpredictability of the available processing and network resources create a strong need to make applications robust against the dynamics of grid environments. The two techniques most suitable for coping with the dynamic nature of the grid are dynamic load balancing (DLB) and job replication (JR). The authors analyze and compare the effectiveness of these two approaches by means of trace-driven simulations. They observe that there exists an easy-to-measure statistic Y and a corresponding threshold value Y* such that DLB consistently outperforms JR when Y > Y*, whereas the reverse is true when Y < Y*. Based on this observation, they propose a simple and easy-to-implement approach, referred to as the DLB/JR method, that can make dynamic decisions about whether to use DLB or JR.
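The DLB/JR method of [8] switches strategy based on a statistic Y and a threshold Y*. Since Y is not defined here, the sketch below uses a hypothetical choice, the coefficient of variation of recently measured processing speeds, purely to illustrate the decision rule "use DLB when Y > Y*, otherwise JR". Both the statistic and the threshold value in the example are assumptions.

```python
import statistics


def choose_strategy(speed_samples, threshold):
    """Return ("DLB" or "JR", Y) following the rule: DLB if Y > Y*, else JR.

    Y is computed here as the coefficient of variation of recent
    processing-speed samples, a hypothetical choice for illustration.
    """
    y = statistics.pstdev(speed_samples) / statistics.mean(speed_samples)
    return ("DLB" if y > threshold else "JR"), y


print(choose_strategy([50, 150], threshold=0.3))             # ('DLB', 0.5)
print(choose_strategy([100, 100, 100, 100], threshold=0.3))  # ('JR', 0.0)
```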


I. MULTIPLE WORKFLOW SCHEDULING STRATEGIES WITH USER RUN TIME ESTIMATES

Hirales-Carbajal et al. [9] present an experimental study of deterministic non-preemptive multiple workflow scheduling strategies on a Grid. They distinguish twenty-five strategies depending on the type and amount of information they require, and analyze scheduling strategies that consist of two and four stages: labeling, adaptive allocation, prioritization, and parallel machine scheduling. These strategies are applied to the execution of the CyberShake, Epigenomics, Genome, Inspiral, LIGO, Montage, and SIPHT workflow applications. To provide a performance comparison, the authors performed a joint analysis considering three metrics. A case study and the corresponding results indicate that well-known DAG scheduling algorithms designed for single-DAG and single-machine settings are not well suited for Grid scheduling scenarios where user run time estimates are available. The authors show that the proposed new strategies outperform other strategies in terms of approximation factor, mean critical path waiting time, and critical path slowdown. The robustness of these strategies is also discussed.

J. STATIC SCHEDULING ALGORITHMS FOR ALLOCATING DIRECTED TASK GRAPHS

Kwok et al. [10] note that static scheduling of a program represented by a directed task graph on a multiprocessor system to minimize the program completion time is a well-known problem in parallel processing. Since finding an optimal schedule is an NP-complete problem in general, researchers have resorted to devising efficient heuristics. A plethora of heuristics have been proposed based on a wide spectrum of techniques, including branch-and-bound, integer programming, searching, graph theory, randomization, genetic algorithms, and evolutionary methods. The objective of the survey is to describe various scheduling algorithms and their functionalities in a contrasting fashion, as well as to examine their relative merits in terms of performance and time complexity. Since these algorithms are based on diverse assumptions, they differ in their functionalities and hence are difficult to describe in a unified context. The authors therefore propose a taxonomy that classifies these algorithms into different categories, and explain each algorithm through an easy-to-understand description followed by an illustrative example that demonstrates its operation. They also outline some novel and promising optimization approaches and current research trends in the area.
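Many of the static heuristics surveyed in [10] are list scheduling algorithms: tasks receive priorities and are then placed on processors one by one. The sketch below shows one textbook variant under simplifying assumptions (identical processors, no communication costs): priorities are bottom levels (b-levels), the length of the longest path from a task to an exit task, and each task goes to the processor on which it can start earliest. The workflow and runtimes are hypothetical.

```python
def b_levels(order, parents, runtime):
    """Bottom level of each task: longest path (sum of runtimes) to an exit task."""
    children = {t: [] for t in order}
    for t in order:
        for p in parents.get(t, []):
            children[p].append(t)
    level = {}
    for t in reversed(order):  # children are processed before their parents
        level[t] = runtime[t] + max((level[c] for c in children[t]), default=0)
    return level


def list_schedule(order, parents, runtime, m):
    """Highest-b-level-first list scheduling on m identical processors.

    Returns dict task -> finish time. Communication costs are ignored.
    """
    level = b_levels(order, parents, runtime)
    free_at = [0.0] * m  # time at which each processor becomes idle
    finish = {}
    # With positive runtimes a parent always has a larger b-level than its
    # children, so sorting by decreasing b-level respects dependencies.
    for t in sorted(order, key=lambda x: -level[x]):
        ready = max((finish[p] for p in parents.get(t, [])), default=0.0)
        proc = min(range(m), key=lambda i: max(free_at[i], ready))
        start = max(free_at[proc], ready)
        finish[t] = start + runtime[t]
        free_at[proc] = finish[t]
    return finish


order = ["A", "B", "C", "D"]
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
runtime = {"A": 2, "B": 3, "C": 1, "D": 2}
print(b_levels(order, parents, runtime))  # {'D': 2, 'C': 3, 'B': 5, 'A': 7}
print(max(list_schedule(order, parents, runtime, m=2).values()))  # 7.0
```

With two processors the schedule length equals the critical path A-B-D (7 time units), since C fits alongside B.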

K. MEETING SOFT DEADLINES IN SCIENTIFIC WORKFLOWS

Plankensteiner et al. [11] propose a new heuristic called Resubmission Impact to support fault-tolerant execution of scientific workflows in heterogeneous parallel and distributed computing environments. In contrast to related approaches, their method can be used effectively on new or unfamiliar environments, even in the absence of historical executions or failure trace models. On top of this method, they propose a dynamic enactment and rescheduling heuristic able to execute workflows with a high degree of fault tolerance while taking soft deadlines into account. Simulated experiments with three real-world workflows in the Austrian Grid demonstrate that the method significantly reduces resource waste compared to conservative task replication and resubmission techniques, while achieving a comparable makespan with only a slight decrease in success probability. In addition, the dynamic enactment method manages to meet soft deadlines in faulty environments in the absence of historical failure trace information or models.
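Resubmission trades time for reliability. The sketch below is not the Resubmission Impact heuristic of [11]; it only shows the basic probability model behind resubmission, assuming each attempt fails independently with the same probability, and finds the smallest number of resubmissions that reaches a target success probability.

```python
def success_probability(p_fail, resubmissions):
    """P(success) within 1 + resubmissions independent attempts."""
    return 1 - p_fail ** (resubmissions + 1)


def resubmissions_needed(p_fail, target):
    """Smallest number of resubmissions whose success probability >= target."""
    if not 0 <= p_fail < 1 or not target < 1:
        raise ValueError("need 0 <= p_fail < 1 and target < 1")
    k = 0
    while success_probability(p_fail, k) < target:
        k += 1
    return k


print(resubmissions_needed(0.2, 0.99))  # 2, i.e. up to three attempts in total
```

Under the same independence assumption, running r concurrent replicas yields the same success probability, 1 - p^r, as r sequential attempts, but pays for all replicas at once instead of spending extra time, which illustrates the trade-off between replication and resubmission.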


III. SYSTEM MODEL

The problem addressed in this paper consists of the execution of a workflow G in the Cloud on or before its deadline dl(G) (i.e., deadline-constrained) at the smallest possible cost (i.e., cost-optimized). Furthermore, because the workflows are subject to a soft deadline, a larger budget can be invested in the execution of G if it increases the likelihood of the deadline being met. The extra budget is expected to be proportional to the importance of the application completing by its deadline. To solve this problem, two subproblems have to be solved, namely provisioning and scheduling. The provisioning problem consists of determining the optimal number and type of VMs that can complete the workflow within its deadline. The scheduling problem consists of determining the placement and order of execution of the different tasks that compose the workflow on the VMs selected during the provisioning stage.
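Provisioning decisions are ultimately judged by cost and by whether the deadline is met. The sketch below evaluates a candidate provisioning plan under an assumed pricing model in which every started billing period (one hour here) is charged in full; the plan, prices, and deadline are hypothetical, and the end of the last VM lease is used as a proxy for the workflow completion time.

```python
import math


def evaluate_plan(leases, deadline, billing_period=3600):
    """Cost and deadline check for a provisioning plan.

    leases: list of (start_s, end_s, price_per_period) for each leased VM.
    Each started billing period is charged in full (assumed pricing model).
    Returns (total_cost, meets_deadline).
    """
    total = 0
    for start, end, price in leases:
        total += math.ceil((end - start) / billing_period) * price
    last_end = max(end for _, end, _ in leases)  # proxy for workflow finish
    return total, last_end <= deadline


plan = [(0, 5400, 10), (1000, 2000, 40)]  # prices in hypothetical cents per hour
print(evaluate_plan(plan, deadline=6000))  # (60, True)
```

The first VM runs for 1.5 hours but is billed for two, which is why plans that pack tasks into already-paid periods can be cheaper without delaying the workflow.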





The provisioning and scheduling problems are 
interconnected, as a different decision in types and number of 
machines may result in a different scheduling of tasks. 

We assume that the workflow application executes 
in a single Cloud data center. Since more predictable 
execution and data transfer times are paramount for meeting 
application deadlines, keeping the workflow in a single data 
center eliminates one possible source of execution delay. It 
also eliminates the cost incurred by data transfer among data 
centers. We also ignore overheads incurred by the workflow 
management system. This is because they are strongly 
dependent on the particular technology for workflow 
management in use, varying from constant time (which could 
be modeled as additional execution time of each task) to 
cyclical regarding the number of tasks managed. 
IV. PROBLEM STATEMENT
Existing approaches either minimize the workflow execution time while ignoring deadlines and budgets, or focus on the minimization of cost while trying to meet the application deadline. The approach considered here addresses three aspects: task scheduling, data transfer, and task replication, for which the following decisions have to be made. For task scheduling, the type of VMs to be used for workflow execution and the start and finish time of each VM must be determined (provisioning), together with the placement of tasks. For data transfer, the start and end times of scheduled tasks must account not only for the tasks themselves but also for the data transfers to the first scheduled task and from the last scheduled task. For task replication, virtual machines must be ready to receive data and tasks at the moment they are required, so that the times estimated during the scheduling process are met.
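Replication only pays off if it fits within the extra budget that a soft deadline justifies. As a hypothetical illustration of budget-constrained replication, and not the policy used by EIPR, the sketch below greedily picks tasks to replicate, longest runtime first, while each replica's cost still fits in the remaining budget. Task names, runtimes, and costs are invented for the example.

```python
def pick_replicas(tasks, budget):
    """Greedy, budget-constrained choice of tasks to replicate.

    tasks: dict name -> (runtime, replica_cost).
    Longest-runtime-first is a hypothetical priority rule for this sketch.
    Returns (tasks chosen for replication, unused budget).
    """
    chosen = []
    # Consider long tasks first: they are assumed to be the most exposed to
    # performance variation.
    for name, (_, cost) in sorted(tasks.items(), key=lambda kv: -kv[1][0]):
        if cost <= budget:
            chosen.append(name)
            budget -= cost
    return chosen, budget


tasks = {"A": (120, 3), "B": (300, 5), "C": (60, 1), "D": (200, 4)}
print(pick_replicas(tasks, budget=8))  # (['B', 'A'], 0)
```

Here D is skipped because its replica no longer fits after B is chosen, while the cheaper A still does; a different priority rule or a knapsack formulation could make a different trade-off.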
V. CONCLUSION

Scientific workflows present a set of characteristics that make 
them suitable for execution in Cloud infrastructures, which 
offer on-demand scalability that allows resources to be 
increased and decreased to adapt to the demand of 
applications. However, public Clouds experience variance in 
actual performance delivered by resources. Thus, two
resources with the same characteristics may have different
performance at a given time, which results in variation in the
execution time of tasks that may lead to delays in the
workflow execution. To reduce the impact of performance
variation of public Cloud resources on the deadlines of
workflows, we proposed a new algorithm, called EIPR, which
takes into consideration the behavior of Cloud resources 
during the scheduling process and also applies replication of 
tasks to increase the chance of meeting application deadlines. 
Experiments using four well-known scientific workflow 
applications showed that the EIPR algorithm increases the 
chance of deadlines being met and reduces the total execution 
time of workflows as the budget available for replication 
increases. 


References 

[1] Abbott R. and Garcia-Molina H. (1988) ‘Scheduling Real-Time Transactions,’ ACM SIGMOD Rec., vol. 17, no. 1, pp. 71-81.

[2] Abrishami S. et al. (2013) ‘Deadline-Constrained Workflow Scheduling Algorithms for IaaS Clouds,’ Future Gener. Comput. Syst., vol. 29, no. 1, pp. 158-169.

[3] Yu J., Buyya R., and Ramamohanarao K. (2008) ‘Workflow Scheduling Algorithms for Grid Computing,’ in Metaheuristics for Scheduling in Distributed Computing Environments, F. Xhafa and A. Abraham, Eds. New York, NY, USA: Springer-Verlag.

[4] Calheiros R. N. et al. (2011) ‘CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments,’ Softw., Pract. Exper., vol. 41, no. 1, pp. 23-50.

[5] Chen W. et al. (2012) ‘WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments,’ in Proc. 8th Int’l Conf. E-Science, pp. 1-8.

[6] Chevochot P. and Puaut I. (1999) ‘Scheduling Fault-Tolerant Distributed Hard Real-Time Tasks Independently of the Replication Strategies,’ in Proc. 6th Int’l Conf. RTCSA, p. 356.

[7] Cirne W. et al. (2007) ‘On the Efficacy, Efficiency and Emergent Behavior of Task Replication in Large Distributed Systems,’ Parallel Comput., vol. 33, no. 3, pp. 213-234.

[8] Dobber M. et al. (2009) ‘Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison,’ IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 2, pp. 207-218.

[9] Hirales-Carbajal A. et al. (2012) ‘Multiple Workflow Scheduling Strategies with User Run Time Estimates on a Grid,’ J. Grid Comput., vol. 10, no. 2, pp. 325-346.




www.erpublication.org 


